Iterative Set Expansion of Named Entities using the Web
Not recorded
Abstract
Set expansion refers to expanding a given partial set of “seed” objects into a more complete set. One system that does set expansion is SEAL (Set Expander for Any Language), which expands entities automatically by utilizing resources from the Web in a language-independent fashion. In a previous study, SEAL showed good set expansion performance using three seed entities; however, when given a larger set of seeds (e.g., ten), SEAL’s expansion method performs poorly. In this paper, we present Iterative SEAL (iSEAL), which allows a user to provide many seeds; briefly, iSEAL makes several calls to SEAL, each call using a small number of seeds. We also show that iSEAL can be used in a “bootstrapping” manner, where each call to SEAL uses a mixture of user-provided and self-generated seeds. We show that the bootstrapping version of iSEAL obtains better results than SEAL while using fewer user-provided seeds. In addition, we compare the performance of various ranking algorithms used in iSEAL, and show that the choice of ranking method has a small effect on performance when all seeds are user-provided, but a large effect when iSEAL is bootstrapped. In particular, we show that Random Walk with Restart is nearly as good as Bayesian Sets with user-provided seeds, and performs best with bootstrapped seeds.
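The iterative scheme described in the abstract (several expander calls, each with a small batch of seeds, optionally mixing in self-generated seeds) can be illustrated with a minimal sketch. Everything named here is hypothetical and not taken from the paper: `iterative_expand`, `toy_expand`, and `TOY_PAGES` are illustration-only names, the toy expander stands in for a real call to SEAL, and ranking by simple occurrence count is a placeholder for the ranking methods the paper compares (such as Random Walk with Restart or Bayesian Sets).

```python
import random
from collections import Counter
from typing import Callable, Iterable, List, Optional, Sequence

# Hypothetical expander type: takes a small batch of seeds, returns candidate entities.
Expander = Callable[[Sequence[str]], Iterable[str]]


def iterative_expand(
    seeds: Sequence[str],
    expand: Expander,
    iterations: int = 5,
    seeds_per_call: int = 2,
    bootstrap: bool = False,
    rng: Optional[random.Random] = None,
) -> List[str]:
    """Call `expand` several times, each time with a small batch of seeds.

    Assumption: candidates are ranked by how often they were returned,
    which is a stand-in for the paper's ranking algorithms.
    """
    rng = rng or random.Random(0)
    pool = list(seeds)           # seeds that may be sampled for a call
    scores: Counter = Counter()  # occurrence count per candidate entity

    for _ in range(iterations):
        batch = rng.sample(pool, min(seeds_per_call, len(pool)))
        for entity in expand(batch):
            scores[entity] += 1
        if bootstrap:
            # Promote the best-ranked candidate that is not yet a seed,
            # so later calls mix user-provided and self-generated seeds.
            for entity, _count in scores.most_common():
                if entity not in pool:
                    pool.append(entity)
                    break

    return [entity for entity, _count in scores.most_common()]


# Toy stand-in for Web pages: each inner list is the entity list of one "page".
TOY_PAGES = [
    ["ford", "toyota", "nissan", "honda"],
    ["toyota", "honda", "bmw"],
    ["apple", "pear", "plum"],
]


def toy_expand(batch: Sequence[str]) -> List[str]:
    """Return all entities from toy pages that contain every seed in the batch."""
    found: List[str] = []
    for page in TOY_PAGES:
        if all(seed in page for seed in batch):
            found.extend(page)
    return found


if __name__ == "__main__":
    print(iterative_expand(["toyota", "honda"], toy_expand, iterations=3))
    print(iterative_expand(["toyota", "honda"], toy_expand, iterations=3, bootstrap=True))
```

Keeping each batch small mirrors the observation in the abstract that the underlying expander works well with a few seeds but poorly with many; the `bootstrap` flag only changes which entities are eligible to be sampled as seeds in later calls.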
Similar resources
People Summarization by Combining Named Entity Recognition and Relation Extraction
The two most important tasks in entity information summarization from the Web are named entity recognition and relation extraction. Little work has been done toward an integrated statistical model for understanding both named entities and their relationships. Most of the previous works on relation extraction assume the named entities are pre-given. The drawbacks of these sequential models are t...
Learning to expand queries using entities
A substantial fraction of web search queries contain references to entities, such as persons, organizations, and locations. Recently, methods that exploit named entities have been shown to be more effective for query expansion than traditional pseudo-relevance feedback methods. In this paper, we introduce a supervised learning approach that exploits named entities for query expansion, using Wik...
AGDISTIS - Graph-Based Disambiguation of Named Entities Using Linked Data
Over the last decades, several billion Web pages have been made available on the Web. The ongoing transition from the current Web of unstructured data to the Web of Data yet requires scalable and accurate approaches for the extraction of structured data in RDF (Resource Description Framework) from these websites. One of the key steps towards extracting RDF from text is the disambiguation of nam...
AGDISTIS - Agnostic Disambiguation of Named Entities Using Linked Open Data
Over the last decades, several billion Web pages have been made available on the Web. The ongoing transition from the current Web of unstructured data to the Data Web yet requires scalable and accurate approaches for the extraction of structured data in RDF (Resource Description Framework) from these websites. One of the key steps towards extracting RDF from text is the disambiguation of named ...
Network Planning Using Iterative Improvement Methods and Heuristic Techniques
The problem of minimum-cost expansion of power transmission network is formulated as a genetic algorithm with the cost of new lines and security constraints and Kirchhoff’s Law at each bus bar included. A genetic algorithm (GA) is a search or optimization algorithm based on the mechanics of natural selection and genetics. An applied example is presented. The results from a set of tests carried ...